181 research outputs found

    A Corpus Approach to Roman Law Based on Justinian’s Digest

    Traditional philological methods in Roman legal scholarship, such as close reading and strict juristic reasoning, have analysed law in extraordinary detail. Such methods, however, have paid less attention to the empirical characteristics of legal texts and have occasionally projected an abstract framework onto the sources. The paper presents a series of computer-assisted methods to open new frontiers of inquiry. Using a Python coding environment, we have built a relational database of the Latin text of the Digest, a historical sourcebook of Roman law compiled under the order of Emperor Justinian in 533 CE. Subsequently, we investigated the structure of Roman law by automatically clustering the sections of the Digest according to their linguistic profile. Finally, we explored the characteristics of Roman legal language according to the principles and methods of computational distributional semantics. Our research has discovered an empirical structure of Roman law which arises from the sources themselves and complements the dominant scholarly assumption that Roman law rests on abstract structures. By building and comparing Latin word embedding models, we were also able to detect a semantic split in words with both a general and a legal sense. These investigations point to a practical focus in Roman law, which is consistent with the view that ancient law schools were more interested in training lawyers for practice than in philosophical neatness.
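The comparison of embedding models mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a "semantic split" shows up as a word having different nearest neighbours in a general-language model and in a legal-language model. The vectors, the Latin words, and the helper names `cosine_similarity` and `nearest_neighbours` are hypothetical toy values chosen for illustration; they are not the paper's data, code, or results.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k=3):
    """Return the k words whose vectors are most similar to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Hypothetical two-dimensional toy embeddings (not taken from the paper).
general_model = {
    "actio": [1.0, 0.1],
    "motus": [0.9, 0.2],
    "iudicium": [0.1, 1.0],
}
legal_model = {
    "actio": [0.1, 1.0],
    "motus": [0.9, 0.2],
    "iudicium": [0.2, 0.9],
}

general_neighbour = nearest_neighbours("actio", general_model, k=1)
legal_neighbour = nearest_neighbours("actio", legal_model, k=1)
```

In this toy setup the closest neighbour of the same word differs between the two models, which is the kind of signal such a comparison would look for. Real embedding models have hundreds of dimensions and are trained on corpora rather than written by hand.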

    The relationship between usage and citations in an open access mega-journal

    How do the level of usage of an article, the timeframe of its usage and its subject area relate to the number of citations it accrues? This paper aims to answer this question through an observational study of usage and citation data collected about the multidisciplinary, open access mega-journal Scientific Reports. The study uses the following methods: an overlap analysis of most read and top-cited articles; Spearman correlation tests between total citation counts over two years and usage over various timeframes; a comparison of first months of citation for most read and all articles; and a Wilcoxon test on the distribution of total citations of early cited articles and the distribution of total citations of all other articles. All analyses were performed using the programming language R. As Scientific Reports is a multidisciplinary journal covering all natural and clinical sciences, we also looked at the differences across subjects. We found a moderate correlation between usage in the first year and citations in the first two years since publication (Spearman correlation coefficient of 0.49, α = 0.05), and that articles with high usage in the first six months are more likely to have their first citation earlier (Wilcoxon = 1,811,500, p < 0.0001), which is also related to higher citations in the first two years (Wilcoxon = 8,071,200, p < 0.0001). As this final assertion is inferred from the results of the other elements of this paper, it would require further analysis.
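The Spearman correlation named in this abstract is the Pearson correlation computed on ranks, with tied values sharing their average rank. The study reports using R; the sketch below is an illustrative pure-Python version, and the function names and the usage and citation figures are hypothetical placeholders, not data from the paper.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-article figures, for illustration only.
first_year_usage = [120, 450, 300, 80, 950]
two_year_citations = [1, 4, 6, 0, 9]
rho = spearman(first_year_usage, two_year_citations)
```

Because the coefficient depends only on rank order, it is robust to the heavy skew that is typical of usage and citation counts.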

    Uptake and outcome of manuscripts in Nature journals by review model and author characteristics

    Background: Double-blind peer review has been proposed as a possible solution to avoid implicit referee bias in academic publishing. The aims of this study are to analyse the demographics of corresponding authors choosing double-blind peer review and to identify differences in the editorial outcome of manuscripts depending on their review model. Methods: The data include 128,454 manuscripts received between March 2015 and February 2017 by 25 Nature-branded journals. We investigated the uptake of double-blind review in relation to journal tier, as well as gender, country, and institutional prestige of the corresponding author. We then studied the manuscripts’ editorial outcome in relation to review model and author’s characteristics. The gender (male, female, or NA) of the corresponding authors was determined from their first name using a third-party service (Gender API). The prestige of the corresponding author’s institutions was measured from the data of the Global Research Identifier Database (GRID) by dividing institutions into three prestige groups with reference to the 2016 Times Higher Education (THE) ranking. We employed descriptive statistics for data exploration, and we tested our hypotheses using Pearson’s chi-square and binomial tests. We also performed logistic regression modelling with author uptake, out-to-review, and acceptance as response, and journal tier, author gender, author country, and institution as predictors. Results: Author uptake for double-blind submissions was 12% (12,631 out of 106,373). We found a small but significant association between journal tier and review type (p value < 0.001, Cramér’s V = 0.054, df = 2). We had gender information for 50,533 corresponding authors and found no statistically significant difference in the distribution of peer review model between males and females (p value = 0.6179). We had 58,920 records with normalised institutions and a THE rank, and we found that corresponding authors from the less prestigious institutions are more likely to choose double-blind review (p value < 0.001, df = 2, Cramér’s V = 0.106). In the ten countries with the highest number of submissions, we found a large significant association between country and review type (p value < 0.001, df = 10, Cramér’s V = 0.189). The outcome both at first decision and post review is significantly more negative (i.e. a higher likelihood of rejection) for double-blind than single-blind papers (p value < 0.001, df = 1, Cramér’s V = 0.112 for first decision; p value < 0.001, df = 1, Cramér’s V = 0.082 for post-review decision). Conclusions: The proportion of authors that choose double-blind review is higher when they submit to more prestigious journals, when they are affiliated with less prestigious institutions, or when they are from specific countries; the double-blind option is also linked to less successful editorial outcomes.
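The effect sizes quoted in this abstract are Cramér's V values, which rescale Pearson's chi-square statistic to the range 0 to 1: V = sqrt(chi2 / (n * (min(rows, columns) - 1))). The following is a minimal sketch of that calculation for a contingency table. The function names and the example counts are hypothetical; the abstract does not publish the underlying tables.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V: chi-square rescaled by sample size and table shape."""
    chi2, n = chi_square(table)
    smaller_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * smaller_dim))


# Hypothetical counts: rows are three journal tiers,
# columns are (single-blind, double-blind) submissions.
example_table = [
    [900, 150],
    [1800, 220],
    [2700, 280],
]
v = cramers_v(example_table)
```

With very large samples such as the one described here, even a small V can be statistically significant, which is why the abstract reports the effect size alongside each p value.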

    How Data Papers Present a Unique Contribution To Open Research In The Humanities And Social Sciences

    The open research movement and initiatives like the FAIR principles have been critical in establishing the importance of data in research, particularly within the sciences. Alongside the sciences, attention to openly available data in Humanities and Social Sciences (HSS) research has gradually grown. This growth is largely attributed to the increased availability of digital collections, the development of new data-intensive methods, an increasingly solid infrastructure, increased pressure from funders, the requirement of data management plans for preservation purposes, and the involvement of research libraries in data curation. In this context, attention to how data is produced, how it is openly and transparently shared, and how it can be reused has generated great interest, accompanied by an inevitable need for reputable data sharing outlets. One such outlet is the data paper, a peer-reviewed publication that focuses on describing a curated dataset. Data papers can be shared in traditional research journals as one subtype of article publication, or, more recently, in data journals which are dedicated to the publication of data papers. This presentation focuses on the work done by the open access Journal of Open Humanities Data (JOHD) in promoting the practice of publishing data papers with their accompanying open access datasets. JOHD was established with Ubiquity Press in 2015 to promote awareness, use, and reuse of humanities data. JOHD data papers promote the comprehensive description of how a dataset was assembled, where it may be accessed, and any crucial context, such as the research questions that framed the data gathering and the limitations of the original methods or scope of sources included. JOHD data papers suggest potential future reuses of data, which recent analytics suggest has helped increase the visibility of datasets, and therefore their research impact (Marongiu et al., forthcoming; McGillivray et al., 2022).
    In addition, an overview of the three key elements (the “golden triangle”) that assess the impact of open research efforts as represented by different research outputs (datasets, data papers and research papers) will be presented, along with proposed initiatives for linking these. In doing so, we aim to (a) find a programmatic way to identify these links by extracting information from available metadata of datasets and verifying their accuracy, and (b) create a “ground truth” in a manual and/or machine-assisted way which would enable the training of more sophisticated NLP-based methods as a next step. We hope to illustrate the importance of including data papers in the research conversation, given that they present a unique contribution to addressing global challenges within the open research arena.

    A computational approach to Latin verbs: new resources and methods

    This thesis presents the application of computational methods to the study of Latin verbs. In particular, we describe the creation of a subcategorization lexicon automatically extracted from annotated corpora; we also present a probabilistic model for the acquisition of selectional preferences from annotated corpora and an ontology (Latin WordNet). Finally, we describe the results of a diachronic, quantitative study of Latin spatial preverbs.

    Emo, love and god: making sense of Urban Dictionary, a crowd-sourced online dictionary

    The Internet facilitates large-scale collaborative projects, and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary’s voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.

    Unsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora

    In this paper, we report experiments on the unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora on the basis of a limited number of search heuristics that do not rely on any previous lexico-syntactic knowledge about SCFs. Although preliminary, the reported results are in line with state-of-the-art lexical acquisition systems. We also explored whether verbs sharing similar SCF distributions share similar semantic properties as well, by clustering verbs that share frames with the same distribution using the Minimum Description Length (MDL) Principle. First experiments in this direction were carried out on Italian verbs with encouraging results.